How accurate are the extremely small P-values used in genomic research: An evaluation of numerical libraries

Authors

  • Sai Santosh Bangalore
  • Jelai Wang
  • David B. Allison
Abstract

In genomics and high dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Because hypothesis testing relies on tail areas of statistical distributions, the accuracy of these areas is important for making scientific judgments with confidence. Previous work on accuracy focused primarily on evaluating professionally written statistical software, such as SAS, on the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST), and on the accuracy of tail areas in statistical distributions. The goal of this paper is to provide guidance to investigators who are developing their own custom scientific software built upon numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from cumulative distribution functions (CDF) of the Chi-square and t-distributions by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R to widely accepted standards of comparison such as ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate up to six significant digits. Among the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results herein can help programmers choose libraries on the basis of CDF accuracy.
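
To illustrate why small tail areas are numerically delicate, the following minimal sketch contrasts two ways of computing an upper-tail Chi-square probability in double precision. It is not the evaluation procedure of the paper and does not use any of the libraries it compares; it relies only on Python's standard `math` module and on the closed-form tails for 1 and 2 degrees of freedom, which are chosen here purely for illustration. The function names are hypothetical.

```python
import math


def chi2_sf_naive_1df(x):
    """Upper tail for 1 df computed as 1 - CDF (prone to cancellation)."""
    # For 1 df, CDF(x) = erf(sqrt(x / 2)).
    return 1.0 - math.erf(math.sqrt(x / 2.0))


def chi2_sf_direct_1df(x):
    """Upper tail for 1 df computed directly via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_naive_2df(x):
    """Upper tail for 2 df computed as 1 - CDF (prone to cancellation)."""
    # For 2 df, CDF(x) = 1 - exp(-x / 2), written here with expm1.
    cdf = -math.expm1(-x / 2.0)
    return 1.0 - cdf


def chi2_sf_direct_2df(x):
    """Upper tail for 2 df computed directly: exp(-x / 2)."""
    return math.exp(-x / 2.0)


def relative_error(approx, reference):
    """Relative error of approx with respect to a nonzero reference value."""
    return abs(approx - reference) / abs(reference)


if __name__ == "__main__":
    for x in (4.0, 40.0, 100.0):
        print(
            x,
            chi2_sf_naive_1df(x),
            chi2_sf_direct_1df(x),
            chi2_sf_naive_2df(x),
            chi2_sf_direct_2df(x),
        )
```

For moderate statistics the two routes agree closely, but once the CDF rounds to 1 in double precision the subtraction returns exactly zero, while the direct tail function still carries a usable value. This is one reason a library's tail-area routines are worth checking before they are used at genome-wide significance levels.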

Similar resources

Estimating the accuracy of genomic selection in small genetic populations: a simulation study

In the present study, two genetically connected populations, one small and one large, were simulated, and the effect of different sources of information from foreign populations on the accuracy of predicted genomic breeding values of young animals in the small population was investigated. A large population consisted of 200000 animals over 15 generations and a small population consisted of 5000 animals over 3 ...

Investigation of hydrodynamic forces acting on piggyback offshore pipelines under steady currents

The oil and gas pipelines are considered as one of the most important offshore structures. In this study, the turbulent flow past piggyback pipelines and vortex shedding are numerically simulated under steady currents. Numerical simulation has been conducted for different values of flow velocities, diameter ratios, and gap ratios; in order to investigate how the hydrodynamic forces of the small...

Examination and Evaluation of the Use of Toys in Iran Public Libraries

Abstract Purpose: Due to the importance of using toys in libraries and its unreasonable role in the development of individual and social personality of children, today many libraries around the world, as well as in Iran by the Iran Public Libraries Foundation, use this possibility. Continuation of the process of sending toys by the Iran Public Libraries Foundation requires a thorough review of...

Presenting a Framework for Supporting Life-long Learning in Iranian public libraries and Its validation

Purpose: Since nowadays public libraries are considered lifelong learning centers, these centers must have the required standards and conditions to support lifelong learning in order that they could help society members to achieve their personal and professional learning more effectively. Accordingly, it is necessary to develop and provide a mechanism to support lifelong learning in public libr...

Numerical Analysis of Stress Distribution During Tunneling in Clay Stone Rock

Modern technology has been used to build tunnels in recent years by means of drilling machines (TBM) that were used for civil engineering work in large cities to reduce the harmful effects of spending on the surface of the earth significantly. To build the tunnel, numerical modeling was used on the basis of the finite element method to predict stress behavior during the tunnel construction proc...

Journal:
  • Computational Statistics & Data Analysis

Volume 53, Issue 7

Pages: -

Publication date: 2009